1. INTRODUCTION

Three-dimensional (3D) modeling is becoming increasingly important, especially in the game and movie
industries, where it serves the audience’s demand for visual excitement. The most common way to build
models is with modeling software. Frequently used window-based modeling packages include Blender,
Wings 3D, Meshmixer, FreeCAD, 3D Slash, and Sculptris. These free packages are, however, more
suitable for beginners, except Blender, which supports more advanced work. Other advanced but not
free modeling packages are Maya, Houdini 17.5, Cinema 4D R20, Autodesk 3ds Max, and LightWave 3D.
They are more suitable for creating advanced and complex graphic models and special effects. The software
employs modeling techniques such as spline/non-uniform rational B-spline (NURBS) modeling, subdivision/box
modeling, contour modeling, digital sculpting and surface modeling [1] to construct high-quality 3D models
quickly. In general, using such software to create 3D models still requires time, experience and skill.

Another modeling technique is image-based modeling, which takes two-dimensional (2D) or
two-and-a-half-dimensional (2.5D) static images as input to construct a 3D model. The more input images
are used, the more refined the generated model becomes. However, the processing time needed to construct
the geometric details from the images and to combine the piecewise detail from each image inevitably
increases. This does not yet include the processing time for calibrating the image sequences in order to
denoise them and improve image quality. Nevertheless, recent research is heading towards uncalibrated
approaches [2] and the use of a minimal number of images to speed up the construction of 3D models.
Constructing a 3D model from scratch is painful [3] for many modelers in the games and entertainment
industries, and coming up with a new model design that offers fresh visual excitement is equally demanding.
One solution to this problem is model blending, which blends (or combines) two or more input 3D models
and automatically outputs a new 3D model design.

In the next section, we review some prominent blending methods. In section 3, we discuss some
3D reconstruction techniques. Although processing images is time-consuming, this research area is receiving
more attention because reducing the number of images used can shorten the processing time considerably.
In section 4, we demonstrate a simple blending implementation and present some results for visual
understanding. Section 5 discusses the future direction of blending research.


2. 3D BLENDING METHODS 

Model shape blending and model shape morphing are two terms that frequently overlap in the
literature. Technically, shape blending combines two or more model shapes to form a new shape, which is
expected to be intermediate between all the inputs. For instance, in Figure 1, the long, slim and straight
magenta object (left) is blended with a short, curved cyan object (middle). This produces a blue object
(right) of average height and slight curvature.




Figure 1. Two model shapes are blended to form a new shape 


Meanwhile, shape morphing is a transition of shapes from a source model to a target model.
The result is a sequence of models, and the model midway through the transition may be similar to the
blended shape. However, morphing can only handle two model shapes at a time. Shape blending, in
contrast, can combine more than two model shapes at once, and the resulting shape may be completely
different from the input shapes. Therefore, shape blending is preferred for creating new ideas in the
design world, whereas shape morphing is preferred for creating animation in the interactive world. In this
section, we review and analyze a few notable shape blending methods.
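
The distinction between the two terms can be illustrated with a minimal sketch. It assumes, purely for
illustration, that every shape is given as a vertex array with the same number of vertices in one-to-one
correspondence; the function names blend_shapes and morph_sequence are hypothetical and do not come from
any of the reviewed methods.

```python
import numpy as np

def blend_shapes(shapes, weights=None):
    """Blend N shapes, each a (V, 3) vertex array with matching vertex order."""
    stack = np.stack([np.asarray(s, dtype=float) for s in shapes])  # (N, V, 3)
    if weights is None:
        weights = np.ones(len(stack))          # equal contribution by default
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize so weights sum to 1
    return np.tensordot(weights, stack, axes=1)  # weighted average, (V, 3)

def morph_sequence(source, target, steps):
    """Morphing: a sequence of shapes moving from source to target."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    return [(1.0 - t) * source + t * target for t in np.linspace(0.0, 1.0, steps)]
```

Under these assumptions, blending accepts any number of inputs and returns one shape, while morphing
accepts exactly two inputs and returns a sequence whose midpoint coincides with the equal-weight blend.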

Jain et al. [4] analyzed each shape feature, its adjacent features and its symmetry features before
combining parts to form an object. Their system performed hierarchical pairing between the shapes,
combining parts that have almost the same positional information in the nodes of the hierarchy. The
paired parts are interpolated from coarse to fine and subsequently swapped and combined to produce new
models. Their method does not involve any manual intervention. However, the output depends heavily
on the accuracy of the part segmentation. In addition, their algorithm requires the inputs to have
comparable sets of matching parts. If one input has very few parts and the other has many, the final output
will contain very little exchange of parts or may be an unpleasant model.

Alhashim et al. [5] and Wu et al. [6] applied a structure-based concept to blend two input model
shapes of varying topology. Their blended result preserves the basic structure (or function) of the input
models: when two input chairs are blended, the output should remain a chair-like model. To achieve that,
they shrank the input models into a skeletal representation. Then, each skeletal part was identified and put
into correspondence between the source and target models semi-automatically (user input was involved).
The blending process was executed via topological events in different orders. Structure preservation was
used to filter out implausible in-between graphs. The remaining in-between structural graphs were
reconstructed as meshes through an inverse mapping from the skeletal representation to the surface model.
The method still has limitations: it requires accurate segmentation into meaningful parts and proper part
correspondence to create a logical functional shape, and the input model shapes are limited to tubular and
sheet-like parts (non-complex shapes).

Huang et al. [7] blended two models to produce new stylized models. Their inputs were a
base model, which was a purely geometric, non-textured 3D model, and a style model, which was a textured
3D geometric model. The base model dominates the basic shape of the new model. For instance, inputting
a “style” cow model and a “base” cup model results in a cow-like cup. To blend the two
inputs, Huang et al. constructed a tree-growing data structure to filter out implausible part combinations
based on six different aspects. Then, based on the topology merging graphs, they applied soft
matching to combine the corresponding parts. Their method can blend models from different
categories, such as a living object with a lifeless object, which is very different from previous works.
Nevertheless, their method is complex and involves human intervention.

Ong et al. [8] applied a slinky-based segmentation method to automatically segment input models
into functional parts. The parts of both models were scaled, oriented and matched accordingly, and parts
were then swapped from one model to the other. Their method is simple, but both models need to have
an equal number of parts, and the input models must be of the same kind of object.

Hua et al. [9] employed the Hausdorff distance to match parts from two input models. They
proposed two approaches to blend the matched parts. The first approach simply interpolates the vertices and
faces from one part to the corresponding part to create in-between parts. The second approach
parameterizes the part pair spherically onto a combined mesh and interpolates the vertices in the combined
mesh to obtain in-between parts. The resulting model variations have plausible shapes and sound
functionality. However, both approaches require objects of the same category, for example blending a round
table with a square table, and both input models must share the same number of segmented parts.
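
As an illustration of distance-based part matching, the following sketch computes the symmetric Hausdorff
distance between two point sets and pairs each part with its nearest counterpart. It assumes the parts are
already expressed in a common coordinate frame and uses a simple greedy nearest match; this is an
illustrative simplification, not necessarily the matching procedure of Hua et al. [9], and the function names
are hypothetical.

```python
import numpy as np

def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.min(axis=1).max()

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def match_parts(parts_a, parts_b):
    """For each part in parts_a, return the index of the closest part in parts_b."""
    return [int(np.argmin([hausdorff(p, q) for q in parts_b])) for p in parts_a]
```

A smaller distance means the two parts occupy a more similar region of space, so the pairing tends to favour
parts with similar position and extent.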


3. 3D RECONSTRUCTION 

3D reconstruction can be heavily device-dependent or, alternatively, can follow a less costly approach
that relies more on economical data acquisition and data processing techniques. The former usually
requires investment in expensive devices or sensors to achieve high accuracy, while the latter invites
improvements in processing algorithms to obtain acceptable results. Research ranges from investigating
affordable setups and instruments for data acquisition to designing frameworks that filter, manage and
process data and reconstruct 3D surfaces. The discussion in this section is organized into modeling objects,
modeling the human face and body/pose, and modeling environments such as rooms, terrain or buildings.


3.1. Modeling objects 

With the rapid advancement of camera technology and computer processors, image-based modeling
(IBM) is gaining momentum. Rather than investing in costly devices such as coordinate measuring machines
(CMM) or laser scanners to achieve high accuracy, researchers are investigating approaches that perform
3D reconstruction using more affordable and readily available hardware such as consumer digital
cameras, webcams or even the built-in cameras of mobile phones. Shujaa and Abdulmajeed [10] applied a
neural network (NN) technique to the depth estimation process.

Some researchers use red, green, blue and depth (RGB-D) sensors such as Microsoft Kinect for 3D
reconstruction. Kinect v1 and v2 use structured light (SL) and time of flight (ToF), respectively. The former
deduces the distance from the distortion of an infrared (IR) pattern on the object’s surface, while the latter
measures the time taken for light to travel to the object surface and be reflected back to the receiver.
Kinect v1 is suitable for indoor use only, as it does not perform well under bright sunlight [11] and its
accuracy decreases exponentially with increasing distance [12]. According to Yang et al. [13], Kinect has a
few advantages, including the ability to perform well under low-light conditions, the ability to resolve pose
silhouette ambiguities, and invariance to color and texture.
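
The time-of-flight principle reduces to a one-line relation: the light covers the sensor-to-surface distance
twice, so the distance is half the round-trip time multiplied by the speed of light. The sketch below is an
idealized illustration that assumes the round-trip time is available directly; real sensors may obtain it
indirectly, and the function name is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum

def tof_distance_m(round_trip_time_s):
    """Distance to the surface from the round-trip travel time of light."""
    # The pulse travels to the surface and back, hence the division by two.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0
```

For a surface a few metres away the round trip takes only tens of nanoseconds, which is why such sensors
need very precise timing.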

According to Durou et al. [14], 3D reconstruction techniques for digital cameras can be classified
as either geometric or photometric, and as requiring either a single image or multiple images, as shown in
Table 1. Geometric shape-from-X techniques identify features in the images, while photometric
shape-from-X techniques analyze the quantity of light received at each photosite of the camera’s sensor.


Table 1. Main shape-from-X techniques [14] 


                Geometric techniques      Photometric techniques
Single image    Structured light          Shape-from-shading (SfS)
                Shape-from-shadows
                Shape-from-contours
                Shape-from-texture
                Shape-from-template
Multi-images    Structure-from-motion     Photometric stereo
                Stereopsis                Shape-from-polarisation (SfP)
                Shape-from-silhouettes
                Shape-from-focus


Boora et al. [2] attempted 3D reconstruction with images obtained from mobile phones.
Tracking was done with the Kanade-Lucas-Tomasi (KLT) feature tracking algorithm, and the feature points
were then projected to approximate the 3D world points. Nevertheless, the authors were unable to recover
the actual size of the object. On the other hand, Ng [15] approximated depth values with Lucas-Kanade
optical flow and trigonometry. Images of 640x480 pixels were taken with a webcam, and the actual size of
objects could be approximated in a calibrated environment.

Yamada and Kimura [16] evaluated the performance of two well-known keypoint detectors/feature
descriptors used for 3D reconstruction, namely the scale-invariant feature transform (SIFT) [17] and
accelerated-KAZE (AKAZE) [18]. They found that AKAZE performed slightly better than SIFT, but that a
sufficient 3D reconstruction was not obtained when either descriptor was used on its own. Hence, they
proposed combining both to obtain better 3D reconstruction results.

When performing IBM, problems such as holes or noise frequently occur. Holes may be due to a
lack of features to track or to occlusion of part(s) of the subject. Awang et al. [19] proposed the enhanced
advancing front mesh (EAFM) method to overcome the problem of finding new points in a hole or missing
area. The enhanced method improves on the original advancing front method (AFM), and their results
showed that it introduced more points in a hole to better represent the features of the object. On the other
hand, active systems such as laser scanners have difficulty obtaining 3D points on surfaces that do not
reflect light well, such as black or transparent surfaces.


3.2. Modeling human 

Human face modeling with Microsoft Kinect also leaves much room for research, especially after the
release of Kinect v2 in 2014. Wasenmüller and Stricker [12] found in their evaluation that Kinect v2
outperformed Kinect v1 for 3D reconstruction, simultaneous localization and mapping (SLAM) and visual
odometry. Human reconstruction using Kinect can focus either on the face only [20] or on
the body/pose [21]. According to Zheng et al. [22], research findings in image-based reconstruction of the
human body can be useful for virtual reality (VR) and augmented reality (AR) content creation. They
proposed a deep-learning-based framework to reconstruct a 3D human model from a single image.


3.3. Modeling environment 

For indoor reconstruction, a 3D room model can be obtained using either Microsoft Kinect
[23]-[25] or images [26]. To overcome the limitations of visual data with transparent and highly reflective
objects such as windows or mirrors, Kim et al. [27] proposed combining audio and visual data so that the
audio complements the visual limitations. They captured the environment with 360° cameras and
recorded acoustic room impulse responses (RIRs) with a loudspeaker and a compact microphone array.
Depth information of the scene is recovered by stereo matching of the captured images and by estimating
the locations of major acoustic reflectors from the sound.

Meanwhile, light detection and ranging (LiDAR) is commonly used for modeling large outdoor areas
such as buildings or terrain. A LiDAR instrument fires up to 150,000 pulses of laser light per second at a
surface. As with ToF cameras, the distance is deduced from the time each pulse takes to bounce back to
the instrument. Generally, LiDAR can be categorized into airborne LiDAR and ground-based LiDAR.
Airborne LiDAR can provide higher levels of detail and hence is a more popular source for terrain mapping
[28]. Its uses include forestry management and planning, oil and gas exploration, archaeology and many
more. Ground-based LiDAR is used for more localized reconstruction, such as building
reconstruction and navigation for autonomous vehicles. Yang and Fan [29] used 3D LiDAR points to
reconstruct buildings. The filtered data are processed using the random sample consensus (RANSAC)
algorithm and a region-growing algorithm for plane detection. Finally, the authors reconstruct the 3D
building using the four corner points of each plane. In addition, Ashraf et al. [30] used synthetic aperture
radar (SAR) for reconstruction, processing the radar echoes with compressed sensing based on adaptive
orthogonal matching pursuit (OMP).
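
The RANSAC step used for plane detection can be sketched generically as follows. This is a textbook-style
illustration under assumed parameters (distance threshold, iteration count, random seed) and hypothetical
function names; it is not the specific pipeline of Yang and Fan [29], which also includes filtering and
region growing.

```python
import numpy as np

def fit_plane(p0, p1, p2):
    """Plane through three points as (unit normal n, offset d) with n.x + d = 0."""
    n = np.cross(p1 - p0, p2 - p0)
    norm = np.linalg.norm(n)
    if norm < 1e-12:           # the three points are (nearly) collinear
        return None
    n = n / norm
    return n, -float(n @ p0)

def ransac_plane(points, threshold=0.05, iterations=200, seed=0):
    """Return the plane supported by the most points and its inlier mask."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best_plane = None
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(iterations):
        # Hypothesize a plane from three randomly chosen points.
        idx = rng.choice(len(points), size=3, replace=False)
        plane = fit_plane(*points[idx])
        if plane is None:
            continue
        n, d = plane
        # Points closer to the plane than the threshold count as inliers.
        mask = np.abs(points @ n + d) < threshold
        if mask.sum() > best_mask.sum():
            best_plane, best_mask = plane, mask
    return best_plane, best_mask
```

In a building scan, the inliers of the best plane would correspond to one facade or roof surface; removing
them and repeating the search would yield further planes.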

Building and terrain modeling can also be performed using aerial images. Ban
et al. [31] utilized a height map generated from a satellite image and then refined the 3D mesh to
remove the staircase shapes on the sides of buildings. Meanwhile, Sugiura et al. [32] proposed high-quality
3D surface reconstruction, especially for man-made structures, as triangular meshes by combining 3D line
segments with the point cloud obtained from structure from motion (SfM).


4. ANALYSIS ON THE POTENTIAL RESEARCH PROBLEMS IN 3D BLENDING WORK 

This section discusses potential problems encountered in 3D blending research.
The examples given here come from our implementation of interpolation-based geometric model blending
using Laplacian-based contraction and the Slinky-based segmentation method [33]-[34]. Due to limited
paper length, readers may refer to the cited papers for more information on the contraction and
segmentation methods. The following focuses only on the blending-related output results.

Most blending techniques require the input objects to be accurately segmented into functional
parts [35]-[36], so that the parts are plausible and meaningful for the subsequent blending. Figure 2
illustrates the quality of model segmentation. Figure 2(a) shows two dogs, of which the left dog is poorly
segmented and the right dog is appropriately segmented. A poorly segmented model leads to many
implausible parts, and over-segmented parts result in open meshes with many unwanted
holes. Figure 2(b) compares our proposed method with seven existing prominent methods,
the details of which are discussed in [33]. Accurate segmentation plays a crucial role in
ensuring correctness at the preliminary stage; otherwise, all subsequent processes and output results will
deviate.

Part matching is another crucial criterion to take into account. Many existing blending methods
(except the method proposed by Huang et al. [7]) require the input models to be of a similar type and to
have the same number of segmented parts. Figure 3 illustrates the input model types. Figure 3(a) shows
models of the same type with the same number of segmented parts. If the two models do not share the same
number of segmented parts (e.g. Figure 3(b)), the model with more parts must merge some of its parts.
This stage requires some intelligence to choose the most suitable parts to merge. For the right-hand model
in Figure 3(b), the best choice is to merge the palm with the wrist rather than with any finger. This can be
determined via skeleton analysis, which is beyond the scope of our discussion here.


[Figure 2 panel labels: Rand-Cuts, Shape-Diam, Norm-Cuts, Core-Extra, Rand-Walks, Fit-Prim, K-Means, Our Method; subfigures (a) and (b)]


Figure 2. Quality of the model segmentation (a) poorly segmented left dog model and well-segmented right
dog model and (b) comparison between our proposed method and seven existing prominent methods




Figure 3. Input model type (a) both input models are of the same type with the same number of segmented
parts and (b) both models are of the same type but with different numbers of segmented parts


The part-blending or part-swapping process does not usually produce obviously odd results. Parts
are simply interpolated (or exchanged) from the source part to the target part. Figure 4 shows blended
results. Figure 4(a) shows the legs of two chairs being blended. The source leg is thick and slanted; the
target leg is slim and straight. The blended leg is less thick and slightly slanted. The result is plausible,
but a straighter leg is preferred for stability. Figure 4(b) shows a blended oval/diamond obtained from a
cuboid and a sphere/diamond. Technically, the orientation of the target part should be preserved; only the
size or the shape of the part should be modified.

The last critical activity of the blending process is to ensure that the blended part attaches properly
and realistically to the original model. Figure 5 shows a blended leg that is not properly attached to the
seat of a chair. This is not a difficult problem, however. One can simply find the center coordinates of the
seat hole and of the blended-leg hole, and then translate and orient the blended-leg hole to merge with the
seat hole.
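
The translation part of this attachment step can be sketched as follows. The sketch assumes that the
boundary vertices of both holes are already known and takes the hole center to be the mean of its boundary
vertices; orientation alignment is omitted for brevity, and the function name and arguments are hypothetical.

```python
import numpy as np

def attach_by_centroid(part_vertices, part_hole_idx, seat_hole_vertices):
    """Translate a part so the center of its hole coincides with the seat hole center."""
    part_vertices = np.asarray(part_vertices, dtype=float)
    seat_center = np.asarray(seat_hole_vertices, dtype=float).mean(axis=0)
    part_center = part_vertices[part_hole_idx].mean(axis=0)
    # The same offset is applied to every vertex so the part keeps its shape.
    return part_vertices + (seat_center - part_center)
```

After the translation, the two hole boundaries share a center, and a subsequent rotation and stitching step
would be needed to close the remaining gap.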


Figure 4. Blended result (a) source part (left), target part (middle), blended leg (right) and (b) source part
(left), target part (middle), blended oval/diamond (right)


Figure 5. Blended leg is not properly attached to the model (bottom view)


5. FUTURE DIRECTION OF 3D BLENDING

Many proposed blending methods are semi-automatic or involve some human intervention to
pair the functional parts, and this does not yet count the human intervention needed to segment the models.
At present, computers are still unable to imitate the human brain in determining the orientation of various
models so that parts can be aligned and matched accurately. Many existing methods apply structural links
among the segmented parts and try blending the parts in the hope of forming a logical model. Deep learning
is one potential approach to solving this problem. So far, not many deep learning approaches have been
applied in the three-dimensional field. This may be due to the already complicated neural networks used in
the two-dimensional field; extending them to three dimensions may sharply increase the memory and
computation burden.
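
A rough sense of this memory burden can be obtained with simple arithmetic. The sketch below assumes,
for illustration only, that the data are stored as a dense grid with one 4-byte value per cell; actual 3D
networks may use sparser representations such as point clouds or meshes, and the function name is
hypothetical.

```python
def dense_grid_bytes(resolution, dims, bytes_per_cell=4):
    """Memory needed for a dense grid with `resolution` cells along each of `dims` axes."""
    return (resolution ** dims) * bytes_per_cell

image_bytes = dense_grid_bytes(256, 2)   # 256 x 256 image-like grid
volume_bytes = dense_grid_bytes(256, 3)  # 256 x 256 x 256 voxel grid
```

Under these assumptions the 2D grid needs 256 KiB while the 3D grid needs 64 MiB, a factor equal to the
resolution itself, and the gap widens as the resolution grows.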

Blending is gradually being applied in many industries, such as the manufacturing, civil
construction, automobile and entertainment industries, where existing products/models are blended to
create new product/model designs. The input models are no longer simple meshes and/or small objects,
but may contain more than a billion points and/or form huge and complex 3D scenes. Even the use of a
graphics processing unit (GPU) may be insufficient to handle this bottleneck.


6. CONCLUSION 


This paper reviews a few prominent 3D blending methods and 3D reconstruction methods for
creating new or in-between models. It also discusses some potential blending problems to be solved at
present and some future directions for model blending in industry. A blending program was developed to
illustrate the results and the potential problems. The 3D segmentation result was compared with some
existing prominent methods, and some blended results were demonstrated using primitive models.